Preprint #04-9 REGRESSION ANALYSIS WITH LINKED DATA
Abstract
Record linkage, or exact matching, can be used to join two files that contain information on the same individuals but lack unique personal identification codes. Possible linkage errors complicate estimation of the relationships between variables on the two files; the effect is analogous to that of measurement error. A model of a linear regression relationship between variables in linked files is proposed. Assuming that the probabilities that pairs of records are links are known, an unbiased estimator of the regression coefficients is derived. Methods for estimating the linkage probabilities by using mixture models are discussed. A consistent estimator of the covariance matrix of the proposed estimator is also given. A bootstrap estimator is used to reflect the impact of uncertainty in the record linkage model parameters on the estimators of the regression parameters. A simulation study compares the performance of the proposed estimator with that of alternatives.
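The idea of correcting a regression for known linkage probabilities can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a simplified model in which the linked response z_i equals y_j with known probability Q[i, j], and y = X beta + error. Under that assumption E[z] = Q X beta, so least squares of z on W = Q X is unbiased for beta, while naive least squares of z on X is attenuated. The names linked_ols and simulate, the exchangeable form of Q, and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np


def linked_ols(z, X, Q):
    """Least squares of the linked response z on W = Q @ X.

    Assumes Q[i, j] is the known probability that record i is linked to
    response j, so that E[z] = Q @ X @ beta under the assumed model.
    """
    W = Q @ X
    beta_hat, *_ = np.linalg.lstsq(W, z, rcond=None)
    return beta_hat


def simulate(n=2000, p_correct=0.8, seed=0):
    """Simulate linked data under a hypothetical exchangeable linkage model."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0])  # illustrative intercept and slope
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(scale=0.5, size=n)

    # Correct link with probability p_correct; otherwise a uniformly
    # chosen wrong record. Rows of Q sum to one.
    Q = np.full((n, n), (1.0 - p_correct) / (n - 1))
    np.fill_diagonal(Q, p_correct)

    idx = np.arange(n)
    is_correct = rng.random(n) < p_correct
    # Shifting by 1..n-1 modulo n picks uniformly among the other records.
    wrong = (idx + rng.integers(1, n, size=n)) % n
    link = np.where(is_correct, idx, wrong)
    z = y[link]  # observed response after (possibly wrong) linkage
    return z, X, Q, beta


if __name__ == "__main__":
    z, X, Q, beta = simulate()
    naive = np.linalg.lstsq(X, z, rcond=None)[0]
    adjusted = linked_ols(z, X, Q)
    print("true    ", beta)
    print("naive   ", naive)
    print("adjusted", adjusted)
```

In this setup the naive slope is pulled toward zero by roughly the factor p_correct, and the adjusted fit recovers the slope up to sampling noise. The sketch treats Q as known and links each record independently; it does not cover estimating the linkage probabilities, the covariance estimator, or the bootstrap step mentioned in the abstract.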
Similar resources
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
arXiv:hep-ph/9904362v1 16 Apr 1999 Preprint SSU-HEP-99/04
The proton structure and proton polarizability corrections to the Lamb shift of electronic hydrogen and muonic hydrogen were evaluated on the basis of modern experimental data on deep inelastic structure functions. The numerical value of the proton polarizability contribution to the (2P-2S) Lamb shift is equal to 4.4 GHz.
The Frequencies of three Factor IX-Linked Restriction Fragment Length Polymorphisms in Iranian Patients with Hemophilia B
Background: Hemophilia B is an X-linked recessive coagulation disorder caused by factor IX deficiency. Analysis of factor IX gene polymorphisms is considered the best approach for prenatal diagnosis and carrier detection of hemophilia B where the identification of gene mutation is not easily possible. Objective: To study the frequency of three factor IX-linked restriction fragment length polym...
arXiv:hep-ex/0411065v2 26 Feb 2005 Belle Preprint 2004-34 KEK Preprint 2004-69 Observation of B+ →
We report measurements of radiative B decays with Kηγ final states, using a data sample of 253 fb⁻¹ recorded at the Υ(4S) resonance with the Belle detector at the KEKB e+e− storage ring. We observe B+ → K+ηγ for the first time with a
Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis
Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses in order to select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can be complementary to regression models. We compared the performance of a logistic regression model and a CART in predi...